
    Comparative analysis of the transcriptome across distant species

    The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.

    Analysis of Nearly One Thousand Mammalian Mirtrons Reveals Novel Features of Dicer Substrates

    Mirtrons are microRNA (miRNA) substrates that utilize the splicing machinery to bypass the necessity of Drosha cleavage for their biogenesis. Expanding our recent efforts for mammalian mirtron annotation, we use meta-analysis of aggregate datasets to identify ~500 novel mouse and human introns that confidently generate diced small RNA duplexes. These comprise nearly 1000 total loci distributed in four splicing-mediated biogenesis subclasses, with 5'-tailed mirtrons as, by far, the dominant subtype. Thus, mirtrons surprisingly comprise a substantial fraction of endogenous Dicer substrates in mammalian genomes. Although mirtron-derived small RNAs exhibit overall expression correlation with their host mRNAs, we observe a subset with substantial differences that suggest regulated processing or accumulation. We identify characteristic sequence, length, and structural features of mirtron loci that distinguish them from bulk introns, and find that mirtrons preferentially emerge from genes with larger numbers of introns. While mirtrons generate miRNA-class regulatory RNAs, we also find that mirtrons exhibit many features that distinguish them from canonical miRNAs. We observe that conventional mirtron hairpins are substantially longer than Drosha-generated pre-miRNAs, indicating that the characteristic length of canonical pre-miRNAs is not a general feature of Dicer substrate hairpins. In addition, mammalian mirtrons exhibit unique patterns of ordered 5' and 3' heterogeneity, which reveal hidden complexity in miRNA processing pathways. These include broad 3'-uridylation of mirtron hairpins, atypically heterogeneous 5' termini that may result from exonucleolytic processing, and occasionally robust decapitation of the 5' guanine (G) of mirtron-5p species defined by splicing. Altogether, this study reveals that this extensive class of non-canonical miRNA bears a multitude of characteristic properties, many of which raise general mechanistic questions regarding the processing of endogenous hairpin transcripts.

    Examples of novel mirtrons confidently annotated in this study.

    (A) Drosha-mediated and splicing-mediated pathways for the generation of Dicer-substrate pre-miRNA hairpins. For both canonical miRNA loci and mirtron loci of all four classes, the critical judgement for their annotation is whether the small RNA evidence supports the notion that their progenitor hairpins were subject to Dicer cleavage to generate specific miRNA/miRNA* (star) duplexes. The thickness of the arrows leading to Dicer indicates the relative number of substrates generated by each pathway. (B) Mirtron supported by especially robust evidence, including the presence of both miRNA and star reads in Ago-IP data and the recovery of abundant phased loop reads. Note that loop reads were not found in the Ago-IP data, providing evidence for the selection of specific species in Ago complexes. (C) A mirtron locus lacking substantial "star" read evidence, whose confidence is bolstered by hundreds of reads in Ago-IP libraries. (D) Example of a skin-restricted mirtron whose reads are highly represented in Ago1-IP, Ago2-IP and Ago3-IP data. (E) Example of a heart-restricted mirtron. Over 96% of reads collected from nearly 700 human small RNA datasets were from heart. (F) Example of a "two-tailed" mirtron supported by abundant Ago-IP reads. We infer that generation of the pre-miRNA involves splicing followed by removal of both 5' and 3' tails (see A).

    Broader distribution of hairpin lengths in mirtrons vs. canonical miRNAs.

    (A) Example of a conventional mirtron with extremely long pre-miRNA hairpin. (B) Example of two-tailed mirtron with extremely long pre-miRNA hairpin. In both cases, small RNA reads were recovered specifically from the genomically distant miRNA/star duplexes. (C) Analysis of mouse pre-miRNA lengths. The left plot illustrates individual loci, while the right plot summarizes their overall behavior. Canonical miRNAs exhibit a very tight distribution with no pre-miRNAs greater than 82 nt. The average lengths of 3'-tailed, 5'-tailed, and two-tailed mirtron pre-miRNAs are similar to canonical pre-miRNAs, but 5'-tailed mirtrons exhibit a noticeably broader length distribution. Conventional mirtrons exhibit noticeably longer pre-miRNA lengths than the other classes. (D) Analysis of human pre-miRNA lengths. Their overall properties are similar to mouse loci, including the subpopulation of long 5'-tailed mirtron hairpins and the substantially increased length of conventional mirtron pre-miRNAs as a class.

    Sequence and length properties of mirtron-containing introns.

    (A) Comparison of mirtron-bearing introns with total introns in human. The distribution of total intron lengths is much broader than for mirtrons. The dominant class of 5'-tailed mirtrons derives mostly from introns that are <3kb in length, while the 3'-tailed mirtrons and conventional mirtrons derive from very short introns. (B, C) Nucleotide bias of small RNAs from 5'-tailed mirtrons. Three anchor points were considered, as schematized on the 5'-tailed mirtron model in the center (1, 2, 3, arrows). (B) Biased nucleotide identities of mirtron-5p reads from the dominant class of 5'-tailed mirtrons. Compared to an equivalent sequence range of control introns of similar length, mirtron-5p reads exhibit substantial 5'-U bias and overall enrichment of G across their lengths. The G bias is greater in the 5' than 3' regions of the mirtron-5p reads, and is not evident in bulk intron sequences downstream of their ~22 nt lengths. (C) Biased nucleotide identities of mirtron-3p reads from the dominant class of 5'-tailed mirtrons. Compared to control introns, there is substantial 5'-U bias (evident when reads are aligned by their 5' ends) and substantial C-bias across their length. Note that the bulk introns exhibit polypyrimidine tracts upstream of the splice acceptor site (YAG), but mirtrons exhibit greater representation of C while control introns show greater bias for U. (D) Mirtronic regions exhibit much lower minimum free energy (MFE) than control intronic regions. CDF (cumulative distribution function) is plotted for MFE/base distribution. (E) All four classes of mirtrons are hosted by genes with greater numbers of introns than average genes. Various classes of other intronic non-coding RNAs (e.g. tRNAs, snoRNAs, and either conserved or non-conserved canonical miRNAs) typically reside in genes with larger numbers of introns than bulk genes, but their averages are intermediate to all classes of mirtrons. (F) Bar graphs that emphasize the individual properties of genes that host various classes of non-coding RNAs. It is evident that all four classes of mirtrons have a broader distribution of intron numbers relative to other types of non-coding RNAs.
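
The MFE/base comparison in panel D of the legend above can be illustrated with a minimal sketch. It assumes that folding energies have already been computed by some RNA folding tool; the function names and all numeric values below are hypothetical and are not taken from the study.

```python
def mfe_per_base(mfe_kcal_per_mol, length_nt):
    """Normalize a folding energy by region length (kcal/mol per nt)."""
    return mfe_kcal_per_mol / length_nt


def ecdf(values):
    """Empirical CDF: (value, fraction of observations <= value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


# Hypothetical (MFE in kcal/mol, length in nt) pairs, for illustration only.
mirtron_regions = [(-35.2, 80), (-41.0, 95), (-28.7, 70)]
control_regions = [(-18.3, 80), (-22.1, 95), (-12.4, 70)]

mirtron_cdf = ecdf([mfe_per_base(e, n) for e, n in mirtron_regions])
control_cdf = ecdf([mfe_per_base(e, n) for e, n in control_regions])
```

Plotting the two lists of pairs as step curves would give a CDF comparison of the kind the legend describes; a curve shifted toward more negative MFE/base indicates more stable predicted structure.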

    Distinct patterns of terminal heterogeneity in mirtron-derived small RNAs.

    (A, B) 5'-end heterogeneity in the 5p and 3p reads from human (A) and mouse (B) miRNA loci. There are several distinctions between canonical miRNAs and specific classes of mirtrons. These include substantial populations of mirtron-5p reads that lack their 5' nucleotide defined by splice donor sites, namely 5p reads from conventional mirtrons and 3'-tailed mirtrons (*), and overall greater 5' heterogeneity in the 5' reads from 5'-tailed mirtrons (#). (C, D) 3'-end heterogeneity in the 5p and 3p reads from human (C) and mouse (D) miRNA loci. Particularly notable are the dominant populations of 3'-tailed reads from 3p arms of conventional mirtrons and 5'-tailed mirtrons (marked by + signs), i.e., those reads that are defined by splice acceptor sites.

    Correlation of mirtron and host gene expression.

    (A) We calculated the Pearson correlation coefficients of the accumulation of mouse mirtron-derived small RNAs and spliced RNA-seq reads directly flanking the mirtron across seven tissues. We also performed 100 control comparisons where the tissue origins were shuffled. The cumulative distribution function (CDF) of these correlations was plotted; the observed correlations were significantly shifted toward positive values relative to the shuffled controls (Mann-Whitney U test). (B) The binned distribution of mirtron/mRNA Pearson correlation coefficients was plotted. This visualization emphasizes their positive correlation, but also highlights a subset of discordant loci. (C) Examples of correlated and discordant expression of mirtron-derived miRNAs and host mRNAs across tissues. We show host-level gene expression as reads per kilobase of transcript per million mapped reads (RPKM) and the spliced exonic reads that directly cross the mirtronic locus as reads per million mapped reads (RPM). Mirtron-derived miRNAs are quantified as reads per million mapped miRNA reads (RPMM).
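
The correlation analysis in panel A of the legend above can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the expression values are hypothetical, the function names are invented for the example, and shuffling the host profile per locus is only one plausible reading of "tissue origins were shuffled". The final rank test is left out; a routine such as `scipy.stats.mannwhitneyu` could compare the observed and control distributions.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant profile; treated as 0 in this sketch
    return cov / sqrt(var_x * var_y)


def shuffled_controls(mirtron, host, n_shuffles=100, seed=0):
    """Correlations after randomly permuting the tissue order of the host profile."""
    rng = random.Random(seed)
    controls = []
    for _ in range(n_shuffles):
        permuted = list(host)
        rng.shuffle(permuted)
        controls.append(pearson(mirtron, permuted))
    return controls


# Hypothetical expression values across seven tissues (not data from the study).
mirtron_rpmm = [12.0, 3.5, 40.2, 8.1, 0.9, 22.4, 5.0]  # mirtron-derived small RNAs
host_rpm = [30.1, 9.8, 95.0, 21.3, 4.2, 60.7, 11.5]    # flanking spliced reads

observed = pearson(mirtron_rpmm, host_rpm)
controls = shuffled_controls(mirtron_rpmm, host_rpm)
```

Repeating this for every mirtron locus yields one set of observed coefficients and one set of shuffled-control coefficients, which are the two distributions the CDF plot compares.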

    Unexpected patterns of 5' heterogeneity and processing of mirtrons.

    (A) A 5'-tailed mirtron in Irak1 exhibits strong heterogeneity in its 5p species that differ in register by 2 nt; this is accompanied by strong heterogeneity in its 3p species. Inspection of this array of isomiR sequences suggests that distinct 5' ends of 5p species may instruct alternative Dicer cleavage. This may be accompanied by subsequent 3' resection of "long" 3p reads produced by Dicer cleavage closer to the terminal loop, and retention of 3'-uridylated 3p reads when the Dicer cleavage is further from the loop. (B) A counter-example in which broad 5' heterogeneity of 5p species, here distributed equally over three nucleotides, is not accompanied by 5' heterogeneity of 3p species, which are extremely precisely defined. All of these reads accumulate similarly in total and Ago-IP data. (C-E) Frequent 5' decapitation of select mirtron-5p reads defined by splicing. (C) Example of 3'-tailed mirtron (hsa-mir-4745) exhibiting nearly complete decapitation of 5'-G from its 5p reads (i.e., "xU" reads). These reads are present in multiple Ago-IP datasets. (D) Example of a conventional mirtron (hsa-mir-1236) exhibiting high frequency "xU" reads supported by Ago-IP evidence. (E) Summary of 5' heterogeneity amongst 5p reads from human conventional and 3'-tailed mirtrons, indicating that many mirtrons are subject to 5' decapitation.

    Greatly expanded annotations of human and mouse mirtrons.

    (A) Numbers of splicing-derived miRNAs in human and mouse, categorized as conventional, 5'-tailed, 3'-tailed, and two-tailed mirtrons. Most of the miRNAs newly annotated in this study were 5'-tailed mirtrons, reflecting their status as the dominant mirtron class in human and mouse. (B) Few mirtrons were annotated from small RNA data in both mouse and human, and only a subset of these were constrained in primary sequence. (C, D) Human and mouse mirtrons are generally modestly expressed, but were annotated to higher levels of evidence than hundreds of human and mouse miRNAs in the miRBase registry (i.e. that have <50 reads in the aggregate data analyzed in this study). Most mirtrons were supported by evidence from Ago-IP datasets (red bars). (E, F) Cumulative distribution function (CDF) plots of enrichment of canonical miRNAs and mirtron-derived miRNAs in Ago complexes. (E) Analysis of human small RNAs. Rat RmC was used as control IP; since Ago4 is not expressed in HeLa cells, it effectively serves as another control IP. Canonical miRNAs were enriched in Ago1-3-IP data as well as input RNA (which is mostly composed of Ago-bound miRNAs), relative to control IP data. Mirtron-derived small RNAs showed similar Ago-IP enrichment, except that they also exhibited enrichment between Ago1-3-IP and input RNA libraries. (F) Analysis of mouse small RNAs shows similar enrichment of canonical miRNAs and mirtron-derived small RNAs in Ago1 and Ago2 complexes relative to control IgG complex.